Load the required libraries:
```r
library(tidyverse)
library(knitr)
```
When knitting the report, the chunk option `cache=TRUE` avoids re-reading the data file on every run. To suppress warnings and informational messages, add `message=FALSE` and `warning=FALSE`.
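These options are normally written in the header of an individual chunk. To apply them to every chunk instead, knitr's `opts_chunk$set()` can be called once in a setup chunk at the top of the document. A minimal sketch (whether caching every chunk is desirable depends on the report):

```r
# Setup chunk: these defaults apply to every later chunk in the report
knitr::opts_chunk$set(
  cache = TRUE,     # reuse results of unchanged chunks instead of re-running them
  message = FALSE,  # hide informational messages (e.g. read_csv() column specs)
  warning = FALSE   # hide warnings
)
```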
```r
df <- read_csv("../../../project/1-data/1-sample_data.csv")
```
Dimensions of the data file (rows, columns):
```r
dim(df)
```

```
## [1] 1000000 9
```
(For nicer printing, we use the `kable()` function and split the variables across several rows.)
```r
summary(df)
```

```
## id y amount_current_loan term
## Min. : 1 Min. :0.0 Min. : 10802 Length:1000000
## 1st Qu.: 250001 1st Qu.:0.0 1st Qu.:174394 Class :character
## Median : 500000 Median :0.5 Median :269676 Mode :character
## Mean : 500000 Mean :0.5 Mean :316659
## 3rd Qu.: 750000 3rd Qu.:1.0 3rd Qu.:435160
## Max. :1000000 Max. :1.0 Max. :789250
##
## credit_score loan_purpose yearly_income home_ownership
## Length:1000000 Length:1000000 Min. : 76627 Length:1000000
## Class :character Class :character 1st Qu.: 825797 Class :character
## Mode :character Mode :character Median : 1148550 Mode :character
## Mean : 1344805
## 3rd Qu.: 1605899
## Max. :165557393
## NA's :219439
## bankruptcies
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1192
## 3rd Qu.:0.0000
## Max. :7.0000
## NA's :1805
```
The R code can be left out of the final report with the `echo=FALSE` chunk option; the chunk's output, such as the table below, is still shown.
| id | y | amount_current_loan | term | credit_score | loan_purpose | yearly_income | home_ownership | bankruptcies |
|---|---|---|---|---|---|---|---|---|
| Min. : 1 | Min. :0.0 | Min. : 10802 | Length:1000000 | Length:1000000 | Length:1000000 | Min. : 76627 | Length:1000000 | Min. :0.0000 |
| 1st Qu.: 250001 | 1st Qu.:0.0 | 1st Qu.:174394 | Class :character | Class :character | Class :character | 1st Qu.: 825797 | Class :character | 1st Qu.:0.0000 |
| Median : 500000 | Median :0.5 | Median :269676 | Mode :character | Mode :character | Mode :character | Median : 1148550 | Mode :character | Median :0.0000 |
| Mean : 500000 | Mean :0.5 | Mean :316659 | NA | NA | NA | Mean : 1344805 | NA | Mean :0.1192 |
| 3rd Qu.: 750000 | 3rd Qu.:1.0 | 3rd Qu.:435160 | NA | NA | NA | 3rd Qu.: 1605899 | NA | 3rd Qu.:0.0000 |
| Max. :1000000 | Max. :1.0 | Max. :789250 | NA | NA | NA | Max. :165557393 | NA | Max. :7.0000 |
| NA | NA | NA | NA | NA | NA | NA's :219439 | NA | NA's :1805 |
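A table like the one above presumably comes from a chunk with `echo=FALSE` that passes `summary()` to `kable()`; the exact hidden code is not shown, so the following is only a sketch on a hypothetical toy data frame (its columns and values are made up). Setting knitr's `knitr.kable.NA` option prints the empty slots of the `summary()` output, which `kable()` otherwise renders as `NA`, as blank cells:

```r
library(knitr)

# Hypothetical toy data frame mixing a numeric and a character column
toy <- data.frame(
  amount = c(10802, 269676, 789250),
  term = c("short", "long", "short")
)

options(knitr.kable.NA = "")          # show empty summary() slots as blanks, not NA
summary_table <- kable(summary(toy))  # summary() of a data frame is a table kable() can print
summary_table
```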
Next steps: review the NA values and the distribution of `y`, and examine the character-type variables in more detail.
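For the first two steps, a minimal base-R sketch: `colSums(is.na(...))` counts the missing values in every column at once, and `table()` / `prop.table()` give the distribution of `y`. It runs on a hypothetical toy data frame; on the real data, `toy` would be replaced by `df`:

```r
# Hypothetical toy stand-in for df (values are made up)
toy <- data.frame(
  y = c(0, 1, 1, 0, 1),
  yearly_income = c(50000, NA, 72000, NA, 61000),
  bankruptcies = c(0, 1, NA, 0, 0)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Distribution of the target y: counts, then shares
y_counts <- table(toy$y)
y_counts
prop.table(y_counts)
```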
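```r
# Convert the categorical columns to factors
df$loan_purpose <- as.factor(df$loan_purpose)
df$y <- as.factor(df$y)

# For a factor, summary() returns the count of each level
summary(df$loan_purpose) %>%
  kable()
```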
| loan_purpose | n |
|---|---|
| business_loan | 17756 |
| buy_a_car | 11855 |
| buy_house | 6897 |
| debt_consolidation | 785428 |
| educational_expenses | 992 |
| home_improvements | 57517 |
| major_purchase | 3727 |
| medical_bills | 11521 |
| moving | 1548 |
| other | 91481 |
| renewable_energy | 109 |
| small_business | 3242 |
| take_a_trip | 5632 |
| vacation | 1166 |
| wedding | 1129 |
Alternatively, with dplyr, sorted by frequency:
```r
df %>%
  group_by(loan_purpose) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  kable()
```
| loan_purpose | n |
|---|---|
| debt_consolidation | 785428 |
| other | 91481 |
| home_improvements | 57517 |
| business_loan | 17756 |
| buy_a_car | 11855 |
| medical_bills | 11521 |
| buy_house | 6897 |
| take_a_trip | 5632 |
| major_purchase | 3727 |
| small_business | 3242 |
| moving | 1548 |
| vacation | 1166 |
| wedding | 1129 |
| educational_expenses | 992 |
| renewable_energy | 109 |
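dplyr's `count()` with `sort = TRUE` is a shorthand for the `group_by()` + `summarise(n = n())` + `arrange(desc(n))` pipeline above. A minimal sketch on a hypothetical toy data frame, with an extra column for each purpose's share of all loans:

```r
library(dplyr)

# Hypothetical toy stand-in for df
toy <- data.frame(
  loan_purpose = c("debt_consolidation", "other", "debt_consolidation", "buy_a_car")
)

purpose_counts <- toy %>%
  count(loan_purpose, sort = TRUE) %>%  # counts per purpose, largest first
  mutate(share = n / sum(n))            # fraction of all loans
purpose_counts
```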
Visualize the selected variables:
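```r
df %>%
  group_by(y, loan_purpose) %>%
  summarise(n = n(), .groups = "drop") %>%           # number of loans per (y, purpose) pair
  ggplot(aes(fill = y, y = n, x = loan_purpose)) +
  geom_bar(position = "dodge", stat = "identity") +  # side-by-side bars for each value of y
  coord_flip() +                                     # horizontal bars keep long purpose names readable
  scale_y_continuous(labels = scales::comma) +
  theme_dark()
```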
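Because `debt_consolidation` dwarfs the other purposes, the dodged bars make it hard to compare `y` within the smaller categories. One option, sketched here on a hypothetical toy data frame, is `position = "fill"`, which stretches every bar to the same height so that each shows the share of `y = 0` and `y = 1` within that purpose:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy stand-in for df
toy <- data.frame(
  y = factor(c(0, 1, 1, 0, 1, 0)),
  loan_purpose = c("other", "other", "buy_a_car", "buy_a_car", "buy_a_car", "moving")
)

p <- toy %>%
  count(y, loan_purpose) %>%
  ggplot(aes(x = loan_purpose, y = n, fill = y)) +
  geom_col(position = "fill") +                  # stacked bars rescaled to height 1
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "share of loans")
p
```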
Loan purposes with the largest number of bankruptcies (`y == 1`), top 10:
```r
df %>%
  filter(y == 1) %>%
  group_by(loan_purpose) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  head(10) %>%
  kable()
```
| loan_purpose | n |
|---|---|
| debt_consolidation | 391875 |
| other | 44888 |
| home_improvements | 27274 |
| business_loan | 10356 |
| medical_bills | 6286 |
| buy_a_car | 5810 |
| buy_house | 3652 |
| take_a_trip | 2870 |
| small_business | 2152 |
| major_purchase | 2120 |
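Raw counts largely mirror how common each purpose is: `debt_consolidation` accounts for 785,428 of the 1,000,000 loans, so it tops any count-based ranking. Ranking purposes by the share of `y == 1` within each purpose separates frequency from risk. A minimal sketch on a hypothetical toy data frame, where `share_y1` is an illustrative column name:

```r
library(dplyr)

# Hypothetical toy stand-in for df, with y already converted to a factor
toy <- data.frame(
  y = factor(c(0, 1, 1, 0, 1, 0)),
  loan_purpose = c("other", "other", "buy_a_car", "buy_a_car", "buy_a_car", "moving")
)

rate_by_purpose <- toy %>%
  group_by(loan_purpose) %>%
  summarise(
    n = n(),                    # loans with this purpose
    share_y1 = mean(y == "1")   # fraction of them with y == 1
  ) %>%
  arrange(desc(share_y1))
rate_by_purpose
```

On the real data, purposes with very few loans (e.g. `renewable_energy`, 109 loans) will have noisier shares than the large ones, so `n` is worth keeping next to the rate.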
Interactive tables with `datatable()` (DT)
```r
library(DT)

df %>%
  group_by(y, loan_purpose) %>%
  summarise(n = n()) %>%
  datatable()
```
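`datatable()` also accepts arguments that make larger tables easier to browse. A minimal sketch on a hypothetical toy data frame: `filter = "top"` adds a filter box above each column, `rownames = FALSE` hides the row index, and `options = list(pageLength = 15)` passes the number of rows per page to the underlying DataTables library:

```r
library(dplyr)
library(DT)

# Hypothetical toy stand-in for df
toy <- data.frame(
  y = factor(c(0, 1, 1)),
  loan_purpose = c("other", "other", "moving")
)

tbl <- toy %>%
  count(y, loan_purpose) %>%
  datatable(
    filter = "top",                  # per-column filter boxes
    rownames = FALSE,                # hide the row index column
    options = list(pageLength = 15)  # rows shown per page
  )
tbl
```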
Interactive plots with plotly
```r
library(plotly)

df %>%
  group_by(y, credit_score) %>%
  summarise(n = n()) %>%
  plot_ly(x = ~credit_score, y = ~n, name = ~y, type = "bar")
```
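An existing ggplot2 chart can also be made interactive without rewriting it in `plot_ly()` syntax: plotly's `ggplotly()` converts a ggplot object, adding hover tooltips, zooming and legend toggling. A minimal sketch on a hypothetical toy data frame (the `credit_score` values are made up):

```r
library(dplyr)
library(ggplot2)
library(plotly)

# Hypothetical toy stand-in for df
toy <- data.frame(
  y = factor(c(0, 1, 1, 0)),
  credit_score = c("good", "good", "fair", "very_good")
)

p <- toy %>%
  count(y, credit_score) %>%
  ggplot(aes(x = credit_score, y = n, fill = y)) +
  geom_col(position = "dodge")   # side-by-side bars for each value of y

interactive_p <- ggplotly(p)     # same chart as an interactive plotly widget
interactive_p
```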